{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d39937ca",
   "metadata": {},
   "source": [
    "# WebPage QA\n",
    "\n",
    "This is an example of a QA example， reference [llama_index example](https://gpt-index.readthedocs.io/en/latest/examples/data_connectors/WebPageDemo.html#using-simplewebpagereader). \n",
    "\n",
    "It works by first obtaining a list of URLs from the user, then extracting relevant information from the pages associated with those URLs.Next, a vector index is created based on this information, and finally, the program is able to answer questions using the indexed information.\n",
    "\n",
    "Before running this example, please set **OPENAI_API_KEY** environment param."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a411d62e",
   "metadata": {},
   "source": [
    "## Init GPTCache"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "9c2022a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import hashlib\n",
    "from gptcache import Cache\n",
    "from gptcache.adapter.api import init_similar_cache\n",
    "\n",
    "\n",
    "def get_hashed_name(name):\n",
    "    return hashlib.sha256(name.encode()).hexdigest()\n",
    "\n",
    "\n",
    "def init_gptcache(cache_obj: Cache, llm: str):\n",
    "    hashed_llm = get_hashed_name(llm)\n",
    "    init_similar_cache(cache_obj=cache_obj, data_dir=f\"similar_cache_{hashed_llm}\")\n",
    "\n",
    "gptcache_obj = GPTCache(init_gptcache)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c24b1889",
   "metadata": {},
   "source": [
    "## Load WebPage Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "36f5198f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    GPTVectorStoreIndex,\n",
    "    ServiceContext,\n",
    "    LLMPredictor,\n",
    "    SimpleWebPageReader,\n",
    ")\n",
    "\n",
    "loader = SimpleWebPageReader(html_to_text=True)\n",
    "documents = loader.load_data(urls=[\"https://milvus.io/docs/overview.md\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff637b81",
   "metadata": {},
   "source": [
    "## Build Index and Get Query Engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "c3753032",
   "metadata": {},
   "outputs": [],
   "source": [
    "index = GPTVectorStoreIndex.from_documents(\n",
    "    documents,\n",
    "    service_context=ServiceContext.from_defaults(\n",
    "        llm_predictor=LLMPredictor(cache=gptcache_obj)\n",
    "    ),\n",
    ")\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "200c03c5",
   "metadata": {},
   "source": [
    "## Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b933dd69",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Milvus is an open source vector database for building and managing large-scale AI applications. It provides fast and accurate vector search capabilities, enabling users to quickly search and retrieve vectors from large datasets.\n",
      "CPU times: user 1.21 s, sys: 206 ms, total: 1.42 s\n",
      "Wall time: 9.69 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "print(query_engine.query(\"What is milvus?\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "2dae9741",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Milvus is an open source vector database for building and managing large-scale AI applications. It provides fast and accurate vector search capabilities, enabling users to quickly search and retrieve vectors from large datasets.\n",
      "CPU times: user 784 ms, sys: 17.8 ms, total: 801 ms\n",
      "Wall time: 940 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "print(query_engine.query(\"What's milvus?\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
